214 research outputs found

    A low-power integrated smart sensor with on-chip real-time image processing capabilities

    A low-power CMOS retina with real-time, pixel-level processing capabilities is presented. Feature extraction and edge enhancement are implemented with fully programmable 1D Gabor convolutions. An equivalent computation rate of 3 GOPS is obtained at very low power consumption ( W per pixel), providing real-time performance ( microseconds for overall computation, ). Experimental results from the first prototype show very good agreement between measured and expected outputs.
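
As an illustration of the 1D Gabor convolution named in this abstract, here is a minimal software sketch. It is not the chip's analog, pixel-level implementation. The function names, the kernel parameters (`half_width`, `sigma`, `frequency`, `phase`) and the zero-padded border handling are assumptions made for this example. It applies an odd-phase Gabor kernel to one pixel row containing a step edge.

```python
import math


def gabor_kernel_1d(half_width, sigma, frequency, phase=0.0):
    """Real 1D Gabor kernel: a Gaussian envelope times a cosine carrier."""
    return [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        * math.cos(2.0 * math.pi * frequency * x + phase)
        for x in range(-half_width, half_width + 1)
    ]


def convolve_row(row, kernel):
    """Same-size 1D convolution; samples outside the row count as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k  # convolution flips the kernel
            if 0 <= j < len(row):
                acc += weight * row[j]
        out.append(acc)
    return out


# Hypothetical input: one pixel row with a dark-to-bright step edge.
row = [0.0] * 8 + [1.0] * 8

# A phase of pi/2 gives an odd (sine-like) kernel, which responds to edges
# and cancels out on flat regions.
kernel = gabor_kernel_1d(half_width=3, sigma=1.5, frequency=0.15,
                         phase=math.pi / 2)
response = convolve_row(row, kernel)
```

With these assumed parameters the response is close to zero on the flat parts of the row and largest around the step. The zero padding also produces a spurious response at the bright right-hand border.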

    Fetal Heart: an experimental project

    2009-06-29 Sardegna DistrICT: results of two years of work and projects for the future

    Automated power gating methodology for dataflow-based reconfigurable systems

    Modern embedded systems designers are required to implement efficient multi-functional applications on portable platforms under strong energy and resource constraints. Automatic tools may help them tackle such a complex scenario: developing complex reconfigurable systems while reducing time-to-market. At the same time, automated methodologies can help them manage power consumption. Dataflow models of computation, thanks to their modularity, have turned out to be extremely useful for these purposes. In this paper, we demonstrate how they can be used to automatically achieve power management from the earliest stage of the design flow. In particular, we focus on the automation of power gating. The methodology has been evaluated on an image processing use case targeting a 90 nm CMOS ASIC technology.

    Mutual Impact between Clock Gating and High Level Synthesis in Reconfigurable Hardware Accelerators

    With the diffusion of cyber-physical systems and the Internet of Things, adaptivity and low power consumption have become of primary importance in digital systems design. Reconfigurable heterogeneous platforms seem to be one of the most suitable choices to cope with such a challenging context. However, their development and power optimization are not trivial, especially when hardware acceleration components are considered. On the one hand, high-level synthesis could simplify the design of this kind of system; on the other hand, it can limit the positive effects of the adopted power saving techniques. In this work, the mutual impact of different high-level synthesis tools and the application of the well-known clock gating strategy in the development of reconfigurable accelerators is studied. The aim is to optimize a clock gating application according to the chosen high-level synthesis engine and target technology (Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)). Different levels of application of clock gating are evaluated, including a novel multi-level solution. Besides assessing the benefits and drawbacks of the clock gating application at different levels, hints for future design automation of low-power reconfigurable accelerators through high-level synthesis are also derived.

    Automatic detection of complete and measurable cardiac cycles in antenatal pulsed-wave Doppler signals

    Background and objective: Pulsed-wave Doppler (PWD) echocardiography is the primary tool for antenatal cardiological diagnosis. Based on it, different measurements and validated reference parameters can be extracted. The automatic detection of complete and measurable cardiac cycles would represent a useful tool for the quality assessment of the PWD trace and the automated analysis of long traces. Methods: This work proposes and compares three different algorithms for this purpose, based on the preliminary extraction of the PWD velocity spectrum envelopes: template matching, supervised classification over a reduced set of relevant waveshape features, and supervised classification over the whole waveshape potentially representing a cardiac cycle. A custom dataset comprising 43 fetal cardiac PWD traces (174,319 signal segments) acquired on an apical five-chamber window was developed and used for the assessment of the different algorithms. Results: The adoption of a supervised classifier trained with the samples representing the upper and lower envelopes of the PWD, with additional features extracted from the image, achieved significantly better results (p …). Conclusions: The results reveal excellent detection performance, suggesting that the proposed approach can be adopted for the automatic analysis of long PWD traces or embedded in ultrasound machines as a first step for the extraction of measurements and reference clinical parameters.
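
Of the three approaches listed in this abstract, template matching is the simplest to sketch. The example below is a generic illustration, not the authors' algorithm. It slides a reference cycle over an envelope signal and keeps the local maxima of the Pearson correlation that exceed a threshold. The function names, the synthetic `template` and `envelope` values and the 0.95 threshold are all assumptions made for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def match_template(envelope, template, threshold=0.95):
    """Return start indices of windows that match the template.

    A window is reported when its correlation with the template reaches
    the threshold and is a local maximum among neighbouring windows.
    """
    m = len(template)
    scores = [pearson(envelope[s:s + m], template)
              for s in range(len(envelope) - m + 1)]
    hits = []
    for s, score in enumerate(scores):
        left = scores[s - 1] if s > 0 else -1.0
        right = scores[s + 1] if s + 1 < len(scores) else -1.0
        if score >= threshold and score >= left and score > right:
            hits.append(s)
    return hits


# Hypothetical envelope of one cardiac cycle (two peaks), in arbitrary units.
template = [0.0, 2.0, 5.0, 2.0, 0.0, 1.0, 2.0, 1.0]
# Synthetic trace: silence, two consecutive complete cycles, silence.
envelope = [0.0] * 4 + template + template + [0.0] * 4

cycle_starts = match_template(envelope, template)
```

On this synthetic trace `cycle_starts` holds the start indices of the two complete cycles. Real envelopes are noisy and vary in duration, which is one reason a plain correlation threshold can fall short of a trained classifier.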

    ALOHA: A Unified Platform-Aware Evaluation Method for CNNs Execution on Heterogeneous Systems at the Edge

    CNN design and deployment on embedded edge-processing systems is an error-prone and effort-hungry process that calls for accurate and effective automated assisting tools. In such tools, pre-evaluating platform-aware CNN metrics such as latency, energy cost, and throughput is a key requirement for successfully reaching the implementation goals imposed by use-case constraints. Especially when more complex parallel and heterogeneous computing platforms are considered, currently utilized estimation methods are inaccurate or require extensive characterization experiments and effort. In this paper, we propose an alternative method, designed to be flexible, easy to use, and accurate at the same time. Considering a modular platform and execution model that adequately describes the details of the platform and the scheduling of different CNN operators on different platform processing elements, our method precisely captures operations and data transfers and their deployment on computing and communication resources, significantly improving the evaluation accuracy. We have tested our method on more than 2000 CNN layers, targeting an FPGA-based accelerator and a GPU platform as reference example architectures. Results have shown that our evaluation method increases the estimation precision by up to 5× for execution time, and by 2× for energy, compared to other widely used analytical methods. Moreover, we assessed the impact of the improved platform-awareness on a set of neural architecture search experiments, targeting both hardware platforms, enforcing 2 sets of latency constraints, and performing 5 trials on each search space, for a total of 20 experiments. The predictability is improved by 4×, reaching, with respect to alternatives, selection results clearly more similar to those obtained with on-hardware measurements.
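
To make the idea of platform-aware estimation concrete, the sketch below shows a deliberately simple analytical latency estimate that accounts for both operations and data transfers of each layer on the processing element it is mapped to. It is a generic roofline-style approximation, not the ALOHA method. The class names, the perfect compute/transfer overlap and every number are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    name: str
    ops_per_second: float    # assumed peak compute rate
    bytes_per_second: float  # assumed data-transfer bandwidth


@dataclass
class Layer:
    name: str
    operations: float   # arithmetic operations in the layer
    bytes_moved: float  # data transferred to and from the element
    mapped_to: str      # name of the element executing the layer


def layer_latency(layer, element):
    """Latency of one layer, assuming compute and transfers overlap perfectly."""
    compute_time = layer.operations / element.ops_per_second
    transfer_time = layer.bytes_moved / element.bytes_per_second
    return max(compute_time, transfer_time)


def network_latency(layers, elements):
    """Sum of per-layer latencies, assuming layers execute one after another."""
    by_name = {element.name: element for element in elements}
    return sum(layer_latency(layer, by_name[layer.mapped_to])
               for layer in layers)


# Hypothetical platform and network; every number is made up.
elements = [
    ProcessingElement("accelerator", ops_per_second=1e9, bytes_per_second=1e8),
    ProcessingElement("cpu", ops_per_second=1e8, bytes_per_second=1e9),
]
layers = [
    Layer("conv1", operations=2e6, bytes_moved=1e5, mapped_to="accelerator"),
    Layer("fc1", operations=1e5, bytes_moved=4e5, mapped_to="cpu"),
]

estimated_seconds = network_latency(layers, elements)
```

A model this coarse ignores scheduling, buffering and contention on shared communication resources, which is the kind of detail a platform and execution model like the one described in the abstract is meant to capture.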

    Modelling and Automated Implementation of Optimal Power Saving Strategies in Coarse-Grained Reconfigurable Architectures

    This paper focuses on how to efficiently reduce power consumption in coarse-grained reconfigurable designs, to allow their effective adoption in heterogeneous architectures supporting and accelerating complex and highly variable multifunctional applications. We propose a design flow for this kind of architecture that, besides automatic customization, is also capable of determining the optimal power management support. Power and clock gating implementation costs are estimated in advance, before physical implementation, on the basis of the functional, technological, and architectural parameters of the baseline design. Experimental results on 90 and 45 nm CMOS technologies demonstrate that the proposed approach guides the designer towards the optimal implementation.